Abstract

We propose MATRIX ALPS for recovering a sparse plus low-rank decomposition of a matrix given its corrupted and incomplete 
linear measurements. Our approach is a first-order projected gradient method over non-convex sets, and it exploits a well-known 
memory-based acceleration technique. We theoretically characterize the convergence properties of MATRIX ALPS using the stable 
embedding properties of the linear measurement operator. We then numerically illustrate that our algorithm outperforms the existing 
convex as well as non-convex state-of-the-art algorithms in computational efficiency without sacrificing stability. 

I. Introduction 

Finding a low-rank plus sparse matrix decomposition from a set of possibly incomplete and noisy measurements is critical in many applications. The list of applications has expanded over the last ten years; examples include MRI signal processing, collaborative filtering, hyperspectral image analysis, and large-scale data processing. A general statement of the problem under consideration is as follows:

Problem. Given a linear operator $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ and a set of observations $y\in\mathbb{R}^p$ (usually $p\ll m\cdot n$):

$$y = \mathcal{A}X^* + \varepsilon, \quad (1)$$

where $X^* := L^* + M^*\in\mathbb{R}^{m\times n}$ is the superposition of a rank-$k$ component $L^*$ and an $s$-sparse component $M^*$ that we desire to recover, identify a matrix $L\in\mathbb{R}^{m\times n}$ of rank (at most) $k$ and a matrix $M\in\mathbb{R}^{m\times n}$ with sparsity level $\|M\|_0\le s$ such that:

$$\{\widehat{L},\widehat{M}\} = \mathop{\arg\min}_{L,M:\ \text{rank}(L)\le k,\ \|M\|_0\le s}\|y-\mathcal{A}(L+M)\|_2^2. \quad (2)$$

Here, $\varepsilon\in\mathbb{R}^p$ represents the potential noise term. For different configurations of the linear operator $\mathcal{A}$ and the signal $X^*$, the above problem arises in various research fields. Next, we briefly address some of the frameworks in which (2) is involved.
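As a concrete illustration of (1) and (2), the sketch below draws a synthetic instance and evaluates the data-fit objective. It is a minimal NumPy sketch under our own assumptions: $\mathcal{A}$ is stored as a dense $p\times mn$ Gaussian matrix acting on the row-major vectorization of $X$, and the helper names `make_instance` and `objective` are hypothetical.

```python
import numpy as np

def make_instance(m, n, k, s, p, noise_std=0.0, seed=0):
    """Hypothetical instance of y = A(L* + M*) + eps with a dense Gaussian A of shape p x (m*n)."""
    rng = np.random.default_rng(seed)
    L_true = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k component
    M_flat = np.zeros(m * n)
    support = rng.choice(m * n, size=s, replace=False)                # uniformly random support
    M_flat[support] = rng.standard_normal(s)
    M_true = M_flat.reshape(m, n)                                      # s-sparse component
    A = rng.standard_normal((p, m * n)) / np.sqrt(p)                   # E||A x||^2 = ||x||^2
    eps = noise_std * rng.standard_normal(p)
    y = A @ (L_true + M_true).ravel() + eps
    return A, y, L_true, M_true

def objective(A, y, L, M):
    """Data error ||y - A(L + M)||_2^2 minimized in problem (2)."""
    r = y - A @ (L + M).ravel()
    return float(r @ r)

A, y, L_true, M_true = make_instance(m=20, n=15, k=2, s=10, p=200)
print(objective(A, y, L_true, M_true))   # noiseless: zero at the true pair
```

In the noiseless case the objective vanishes at the true pair $(L^*, M^*)$; with `noise_std > 0` it equals $\|\varepsilon\|_2^2$ there.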


A. Compressed sensing and affine rank minimization

In the standard Compressed Sensing (CS) framework, we desire to reconstruct an $n$-dimensional, $s$-sparse loading vector through a $p$-dimensional set of observations with $p\ll n$. This problem can be solved by finding the minimizer $\widehat{X} := \widehat{M}$ of:

$$\{\widehat{M}\} = \mathop{\arg\min}_{M:\ M\in\mathbb{D}^n,\ \|M\|_0\le s}\|y-\mathcal{A}M\|_2^2, \quad (3)$$

where we reserve $\mathbb{D}^n$ to denote the set of $n\times n$ diagonal matrices. To establish solution uniqueness and reconstruction stability in (3), $\mathcal{A}$ is usually assumed to satisfy the sparse restricted isometry property (sparse-RIP) [1]:

$$(1-\delta_s(\mathcal{A}))\|X\|_F^2\le\|\mathcal{A}X\|_2^2\le(1+\delta_s(\mathcal{A}))\|X\|_F^2, \quad (4)$$

$\forall X\in\mathbb{D}^n$ with $\|X\|_0\le s$ and $\delta_s(\mathcal{A})\in(0,1)$.

In the general affine rank minimization (ARM) problem, we aim to recover a low-rank matrix $X^* := L^*$ from a set of observations $y\in\mathbb{R}^p$, according to (1). The challenge is to reconstruct the true matrix given $p\ll m\cdot n$. A practical means to tackle this problem is to find the simplest solution $\widehat{X} := \widehat{L}$ of minimum rank that minimizes the data error:

$$\{\widehat{L}\} = \mathop{\arg\min}_{L:\ \text{rank}(L)\le k}\|y-\mathcal{A}L\|_2^2. \quad (5)$$

[2] provides guarantees for an exact and unique solution using the rank-RIP property for affine transformations, where $\mathcal{A}$ satisfies:

$$(1-\delta_k(\mathcal{A}))\|X\|_F^2\le\|\mathcal{A}X\|_2^2\le(1+\delta_k(\mathcal{A}))\|X\|_F^2, \quad (6)$$

$\forall X\in\mathbb{R}^{m\times n}$ with $\text{rank}(X)\le k$ and $\delta_k(\mathcal{A})\in(0,1)$.
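Certifying RIP constants is intractable in general, but a Monte Carlo probe gives some intuition for (4) and (6). The sketch below is written under our own assumptions (a scaled Gaussian $\mathcal{A}$ stored as a $p\times mn$ matrix, hypothetical sizes and helper names): it samples random rank-$k$ matrices and random $s$-sparse matrices (the general sparse setting of the Problem, not only the diagonal CS case) and reports the spread of $\|\mathcal{A}X\|_2^2/\|X\|_F^2$. Because it sees only finitely many draws, the returned value is a lower bound on the true constant, not an estimate of it.

```python
import numpy as np

def rip_ratio_probe(A, sampler, trials=200, seed=0):
    """Spread of ||A X||_2^2 / ||X||_F^2 over random X = sampler(rng).

    Returns (min ratio, max ratio, empirical lower bound on delta)."""
    rng = np.random.default_rng(seed)
    ratios = np.empty(trials)
    for t in range(trials):
        X = sampler(rng)
        ratios[t] = np.sum((A @ X.ravel()) ** 2) / np.sum(X ** 2)
    lo, hi = ratios.min(), ratios.max()
    return lo, hi, max(1.0 - lo, hi - 1.0)

# Hypothetical sizes; A is a p x (m*n) Gaussian matrix scaled by 1/sqrt(p).
m, n, p, k, s = 20, 20, 300, 2, 10
A = np.random.default_rng(1).standard_normal((p, m * n)) / np.sqrt(p)

def rank_k_sample(rng):
    return rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

def s_sparse_sample(rng):
    X = np.zeros(m * n)
    X[rng.choice(m * n, size=s, replace=False)] = rng.standard_normal(s)
    return X.reshape(m, n)

print("rank-k  :", rip_ratio_probe(A, rank_k_sample))
print("s-sparse:", rip_ratio_probe(A, s_sparse_sample))
```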

B. Fusing low-dimensional embedding models 

Robust Principal Component Analysis (RPCA) deals with the challenge of recovering a low-rank and a sparse matrix component from a complete data matrix. In mathematical terms, we acquire a finite set of observations $Y\in\mathbb{R}^{m\times n}$ according to $Y = L^* + M^*$, with $L^*\in\mathbb{R}^{m\times n}$ and $M^*\in\mathbb{R}^{m\times n}$ as defined above. The "robust" characterization of the RPCA problem refers to $M^*$ having gross non-zero entries with arbitrary energy. Under mild assumptions concerning the incoherence between $L^*$ and $M^*$ [3], we can efficiently reconstruct both the low-rank and the sparse components using convex and non-convex optimization approaches [3], [4].



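1: Input: $y$, $\mathcal{A}$, $\mathcal{A}^*$, Tolerance $\eta$, MaxIterations
2: Initialize: $\{L_0, M_0\}\leftarrow 0$, $\{\mathcal{L}_0, \mathcal{M}_0\}\leftarrow\{\emptyset\}$, $i\leftarrow 0$
3: repeat
4:   $\mathcal{S}_i^L\leftarrow\mathcal{D}_i^L\cup\mathcal{L}_i$ where $\mathcal{D}_i^L\leftarrow\text{ortho}(\mathcal{P}_k(\nabla f(X_i)))$
5:   $\mathcal{S}_i^M\leftarrow\mathcal{D}_i^M\cup\mathcal{M}_i$ where $\mathcal{D}_i^M\leftarrow\text{supp}(\mathcal{P}_s(\nabla f(X_i)))$
6:   Low rank matrix estimation:
7:   $V_i^L\leftarrow\arg\min_{V:\,V\in\text{span}(\mathcal{S}_i^L)}\|y-\mathcal{A}(V+M_i)\|_2^2$
8:   $L_{i+1}\leftarrow\mathcal{P}_k(V_i^L)$ with $\mathcal{L}_{i+1}\leftarrow\text{ortho}(L_{i+1})$
9:   Sparse matrix estimation:
10:  $V_i^M\leftarrow\arg\min_{V:\,\text{supp}(V)\subseteq\mathcal{S}_i^M}\|y-\mathcal{A}(V+L_i)\|_2^2$
11:  $M_{i+1}\leftarrow\mathcal{P}_s(V_i^M)$ with $\mathcal{M}_{i+1}\leftarrow\text{supp}(M_{i+1})$
12:  $X_{i+1}\leftarrow L_{i+1}+M_{i+1}$
13:  $i\leftarrow i+1$
14: until $\|X_i-X_{i-1}\|_2\le\eta\|X_i\|_2$ or MaxIterations.

Algorithm 1: SpaRCS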



C. Contributions 

While solving the RPCA problem itself is a difficult task, here we further assume that: (i) $\mathcal{A}$ is an arbitrary linear operator satisfying both sparse- and rank-RIP (this assumption includes the identity linear map of RPCA as a special case), and (ii) the total number of observations in $y$ is much smaller than the total number of variables we want to recover, i.e., $p\ll m\cdot n$. Our contributions are two-fold:

• For noisy settings and arbitrary operators $\mathcal{A}$ satisfying sparse- and rank-RIP, we provide better restricted isometry constant guarantees compared to state-of-the-art approaches [5].

• We introduce MATRIX ALPS, an accelerated, memory-based algorithm, along with a preliminary convergence analysis.

The organization of the paper is as follows. In Section II, we describe the SpaRCS algorithm in a nutshell, and we present the main theorem of the paper in Section III. In Section IV, we briefly study acceleration techniques in the recovery process. We provide empirical support for our claims of better data recovery performance and reduced complexity in Section V.

Notation: We reserve lower-case letters for scalar variable representation. Bold upper-case letters denote matrices, while bold calligraphic upper-case letters represent linear maps. We reserve plain calligraphic upper-case letters for set representations. We denote a set of orthonormal, rank-1 matrices that span the subspace induced by $X$ as $\text{ortho}(X)$. Given a matrix $X$ and a subspace set $\mathcal{S}$ such that $\text{span}(\mathcal{S})\subseteq\text{span}(\text{ortho}(X))$, the orthogonal projection of $X$ onto the subspace spanned by $\mathcal{S}$ is given by $\mathcal{P}_{\mathcal{S}}X$, while $\mathcal{P}_{\mathcal{S}}^{\perp}X$ represents the projection onto the subspace orthogonal to $\text{span}(\mathcal{S})$. Given a matrix $X$ and an index set $\mathcal{U}$, $(X)_{\mathcal{U}}$ denotes the (sub)matrix of $X$ with entries in $\mathcal{U}$, while $(X)_{\mathcal{U}^c}$ denotes the (sub)matrix of $X$ with entries in the complement set of $\mathcal{U}$. The best $s$-sparse and rank-$k$ approximations of a matrix $X$ are given by $\mathcal{P}_s(X)$ and $\mathcal{P}_k(X)$, respectively. For any two subspace sets $\mathcal{S}_1, \mathcal{S}_2$, we use the shorthand $\mathcal{P}_{\mathcal{S}_1\setminus\mathcal{S}_2}$ to denote the projection onto the subspace defined by $\mathcal{S}_1$, orthogonal to the subspace defined by $\mathcal{S}_2$; similar notation is used for index sets. We use $X_i\in\mathbb{R}^{m\times n}$ to represent the current matrix estimate at the $i$-th iteration. The rank of $X$ is denoted as $\text{rank}(X)\le\min\{m,n\}$, while the non-zero index set of $X$ is given by $\text{supp}(X)$. The empirical data error $f(X) := \|y-\mathcal{A}X\|_2^2$ has gradient $\nabla f(X) := -2\mathcal{A}^*(y-\mathcal{A}X)$, where $\mathcal{A}^*$ is the adjoint linear operator. $\mathbf{I}$ represents the identity matrix.
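The operators defined above can be sketched in NumPy as follows. This is our own illustrative code, not the authors' MATLAB implementation: `proj_sparse` keeps the $s$ largest-magnitude entries ($\mathcal{P}_s$), `proj_rank` uses a truncated SVD ($\mathcal{P}_k$, optimal in Frobenius norm by the Eckart-Young theorem), `ortho` returns the leading $r$ rank-1 atoms $u_jv_j^T$, and $\mathcal{A}$ is assumed to be a dense $p\times mn$ matrix, so that $\mathcal{A}^*$ is its transpose.

```python
import numpy as np

def proj_sparse(X, s):
    """P_s(X): keep the s largest-magnitude entries of X and zero the rest."""
    out = np.zeros_like(X)
    flat = np.argsort(np.abs(X), axis=None)[::-1][:s]      # flat indices of the s largest |x_ij|
    rows, cols = np.unravel_index(flat, X.shape)
    out[rows, cols] = X[rows, cols]
    return out

def proj_rank(X, k):
    """P_k(X): best rank-k approximation via a truncated SVD."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * sv[:k]) @ Vt[:k, :]

def ortho(X, r):
    """The r leading orthonormal rank-1 matrices u_j v_j^T of X."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return [np.outer(U[:, j], Vt[j, :]) for j in range(r)]

def grad_f(A, y, X):
    """Gradient of f(X) = ||y - A vec(X)||_2^2, i.e. -2 A^*(y - A X)."""
    return (-2.0 * A.T @ (y - A @ X.ravel())).reshape(X.shape)
```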

II. The SpaRCS Algorithm 

An explicit description of SpaRCS [5] is provided in Algorithm 1 in pseudocode form. This approach borrows from a series of vector and matrix reconstruction algorithms such as CoSaMP [6] and ADMiRA [7]. In a nutshell, this algorithm seeks to improve the current subspace and support set selection by iteratively collecting extended sets $\mathcal{S}_i^L$ and $\mathcal{S}_i^M$ with $|\mathcal{S}_i^L|\le 2k$ and $|\mathcal{S}_i^M|\le 2s$, respectively. Then, $s$-sparse and rank-$k$ matrices are estimated to fit the measurements in these restricted subspace/support sets using least-squares techniques.
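A minimal NumPy sketch of Algorithm 1 is given below. It assumes $\mathcal{A}$ is a dense $p\times mn$ matrix acting on the row-major vectorization of $X$, solves the least-squares steps with `np.linalg.lstsq` rather than the iterative solvers one would use at scale, and follows Algorithm 1 as written, including the use of $L_i$ (not $L_{i+1}$) in Step 10. The function and helper names are hypothetical.

```python
import numpy as np

def _rank1_atoms(X, r):
    """ortho(P_r(X)): the r leading orthonormal rank-1 atoms u_j v_j^T of X."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return [np.outer(U[:, j], Vt[j]) for j in range(r)]

def _top_entries(X, s):
    """supp(P_s(X)) as a set of flat (row-major) indices."""
    return set(np.argsort(np.abs(X), axis=None)[::-1][:s].tolist())

def sparcs(A, y, m, n, k, s, eta=1e-6, max_iter=100):
    """Sketch of Algorithm 1 (SpaRCS) for a dense operator A of shape p x (m*n)."""
    L = np.zeros((m, n)); M = np.zeros((m, n))
    L_atoms, M_supp = [], set()
    for _ in range(max_iter):
        X = L + M
        G = (-2.0 * A.T @ (y - A @ X.ravel())).reshape(m, n)   # grad f(X_i)
        S_L = _rank1_atoms(G, k) + L_atoms                      # Step 4
        S_M = sorted(_top_entries(G, s) | M_supp)               # Step 5
        # Step 7: least squares over span(S_L) with M_i fixed.
        B = np.stack([A @ a.ravel() for a in S_L], axis=1)
        c, *_ = np.linalg.lstsq(B, y - A @ M.ravel(), rcond=None)
        V_L = sum(ci * a for ci, a in zip(c, S_L))
        # Step 8: L_{i+1} = P_k(V_L); keep its atoms for the next Step 4.
        U, sv, Vt = np.linalg.svd(V_L, full_matrices=False)
        L_new = (U[:, :k] * sv[:k]) @ Vt[:k]
        L_atoms = [np.outer(U[:, j], Vt[j]) for j in range(k)]
        # Step 10: least squares over the entries in S_M with L_i fixed.
        z, *_ = np.linalg.lstsq(A[:, S_M], y - A @ L.ravel(), rcond=None)
        V_M = np.zeros(m * n); V_M[S_M] = z
        # Step 11: M_{i+1} = P_s(V_M).
        keep = np.argsort(np.abs(V_M))[::-1][:s]
        M_flat = np.zeros(m * n); M_flat[keep] = V_M[keep]
        M_supp = set(np.flatnonzero(M_flat).tolist())
        L, M = L_new, M_flat.reshape(m, n)                      # Step 12: X_{i+1} = L + M
        if np.linalg.norm(L + M - X) <= eta * np.linalg.norm(L + M):
            break
    return L, M
```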

III. Improved Convergence Guarantees 

Before we present our analysis, we note the following. Recovering both $L^*$ and $M^*$ from $y$ is meaningful only under mild conditions on $L^*$ and $M^*$. Borrowing from [3], we assume that the low-rank component $L^*$ is not sparse and is uniformly bounded with respect to its singular vectors, and that the sparse component $M^*$ is not low rank and has a support set drawn uniformly at random over the entries of $M^*$.

An important ingredient for our matrix analysis is the following lemma; its proof can be found in [5].

Lemma 1. Let $\mathcal{J}$ be a support set with $|\mathcal{J}|\le s$, and assume $L\in\mathbb{R}^{m\times n}$ is a rank-$k$ matrix satisfying the conditions above. Then, given a general linear operator $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ satisfying both sparse- and rank-RIP, we have:

$$\|(\mathcal{A}^*\mathcal{A}L)_{\mathcal{J}}\|_F\le\delta_{s+k}(\mathcal{A})\|L\|_F\quad\text{for }\min\{m,n\}\ge s\ge k,$$

where $\delta_{s+k}(\mathcal{A})$ denotes the RIP constant of $\mathcal{A}$ over (disjoint) sparse index and low-rank subspace sets whose combined cardinality is less than $s+k$.

We provide improved convergence conditions for Algorithm 1; the details of the proof can be found in the Appendix. The following theorem characterizes Algorithm 1:

Theorem 1. Given the problem configuration described in (1) and (2), assume the linear operator $\mathcal{A}$ satisfies the sparse-RIP and rank-RIP for $\delta_{4s}(\mathcal{A})\le 0.075$, $\delta_{4k}(\mathcal{A})\le 0.04$ and $\delta_{3s+3k}(\mathcal{A})\le 0.07$. Then, the $(i+1)$-th matrix estimate $X_{i+1}$ of Algorithm 1 can be decomposed into a superposition of low-rank and sparse components as $X_{i+1} = L_{i+1}+M_{i+1}$, satisfying the recursions:

$$\|L^*-L_{i+1}\|_F\le\rho_1\|L^*-L_i\|_F+\rho_2\|M^*-M_i\|_F+\gamma_1\|\varepsilon\|_2,$$

$$\|M^*-M_{i+1}\|_F\le\rho_3\|L^*-L_i\|_F+\rho_4\|M^*-M_i\|_F+\gamma_2\|\varepsilon\|_2,$$

where $\rho_1 = 0.1605$, $\rho_2 = 0.3431$, $\rho_3 = 0.3376$, $\rho_4 = 0.1414$, $\gamma_1 = 4.36$ and $\gamma_2 = 4.45$.
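Since both recursions are linear in the error pair $(\|L^*-L_i\|_F, \|M^*-M_i\|_F)$, they can be stacked into a $2\times 2$ nonnegative matrix $R$: the pair contracts when the spectral radius of $R$ is below one, and the fixed point $(\mathbf{I}-R)^{-1}\gamma\,\|\varepsilon\|_2$ bounds the asymptotic error. The short check below compares these constants with those of [5] quoted next. It assumes the constants act in the order listed (first row $\rho_1,\rho_2$ for the low-rank error, second row $\rho_3,\rho_4$ for the sparse error).

```python
import numpy as np

def spectral_radius(R):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(max(abs(np.linalg.eigvals(R))))

# Assumed layout: rows act on [||L* - L_i||_F, ||M* - M_i||_F].
R_new = np.array([[0.1605, 0.3431], [0.3376, 0.1414]])   # Theorem 1
g_new = np.array([4.36, 4.45])
R_old = np.array([[0.479, 0.474], [0.47, 0.324]])        # constants reported in [5]
g_old = np.array([6.68, 6.88])

for name, R, g in [("Theorem 1", R_new, g_new), ("[5]", R_old, g_old)]:
    # e_{i+1} <= R e_i + g ||eps||_2 with R >= 0 entrywise: if spectral_radius(R) < 1,
    # the error is asymptotically at most (I - R)^{-1} g ||eps||_2 (Neumann series).
    limit = np.linalg.solve(np.eye(2) - R, g)
    print(f"{name}: spectral radius {spectral_radius(R):.3f}, limit/||eps|| {limit.round(2)}")
```

With these numbers the spectral radius drops from about 0.88 to about 0.49, and the asymptotic noise amplification drops from between 52 and 60 to below 9 (in units of $\|\varepsilon\|_2$).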

To compare with state-of-the-art approaches, [5] provides the following constants for the same RIP assumptions: $\rho_1 = 0.479$, $\rho_2 = 0.474$, $\rho_3 = 0.47$, $\rho_4 = 0.324$, $\gamma_1 = 6.68$ and $\gamma_2 = 6.88$. We note here that the above theorem requires the intermediate estimates $L_i$ and $M_i$, $\forall i$, to satisfy the conditions of Lemma 1. Unfortunately, we cannot guarantee that $L_i$ and $M_i$ are uniformly bounded or have random support set patterns, respectively, at each iteration for arbitrary problem configurations. Although the underlying optimization problem is non-convex, recent works on non-convex optimization [8], [9] establish mild conditions on the objective function and the regularization terms, which are satisfied in our setting, under which a stationary point of a non-convex problem can be obtained using memory-less or memory-based projected gradient descent methods.

Next, we sketch the proof of Theorem 1 in a modular fashion and use its key ingredients to analyze our MATRIX ALPS algorithm.

A. Subspace and support exploration 

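Lemma 2 (Active subspace expansion). At each iteration, the Active Subspace Expansion step (Step 4) captures information contained in the true matrix $L^*$, with $\mathcal{L}^*\leftarrow\text{ortho}(L^*)$, such that:

$$\|\mathcal{P}_{\mathcal{L}^*\setminus\mathcal{S}_i^L}(L^*-L_i)\|_F\le\big(2\delta_{2k}(\mathcal{A})+2\delta_{3k}(\mathcal{A})\big)\|L^*-L_i\|_F+2\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{2(1+\delta_{2k}(\mathcal{A}))}\,\|\varepsilon\|_2.$$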

Lemma 2 states that, at each iteration, the Active Subspace Expansion step identifies a rank-$2k$ subspace in $\mathbb{R}^{m\times n}$ such that the amount of unrecovered energy of $L^*$, i.e., the projection of $L^*$ onto the orthogonal complement of $\text{span}(\mathcal{S}_i^L)$, is bounded as shown above. Similarly, the following corollary holds for the sparse estimation part:

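Corollary 1 (Active support expansion). At each iteration, the Active Support Expansion step (Step 5) captures information contained in the true matrix $M^*$, with $\mathcal{M}^*\leftarrow\text{supp}(M^*)$, such that:

$$\|(M^*-M_i)_{\mathcal{M}^*\setminus\mathcal{S}_i^M}\|_F\le\big(\delta_{2s}(\mathcal{A})+\delta_{3s}(\mathcal{A})\big)\|M^*-M_i\|_F+\big(\delta_{2k+s}(\mathcal{A})+\delta_{2k+2s}(\mathcal{A})\big)\|L^*-L_i\|_F+\sqrt{2(1+\delta_{4s}(\mathcal{A}))}\,\|\varepsilon\|_2.$$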

B. Least-squares estimates over low rank subspaces 

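Lemma 3 (Least-squares error norm reduction over a low-rank subspace). Let $\mathcal{S}_i^L$ be a set of orthonormal, rank-1 matrices such that $\text{rank}(\text{span}(\mathcal{S}_i^L))\le 2k$. Then, the rank-$2k$ solution $V_i^L$ in Step 7 identifies most of the energy of $L^*$ over $\mathcal{S}_i^L$ such that:

$$\|V_i^L-L^*\|_F\le\frac{1}{\sqrt{1-\delta_{3k}^2(\mathcal{A})}}\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F+\frac{1}{1-\delta_{3k}(\mathcal{A})}\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big).$$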

Assuming $\mathcal{A}$ is well-conditioned over low-rank subspaces, the main complexity of this operation is dominated by the solution of a symmetric linear system of equations. Using Lemma 3 and the inequality

$$\|L_{i+1}-V_i^L\|_F\le\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\le\|V_i^L-L^*\|_F,$$

which is due to the best rank-$k$ subspace selection on $V_i^L$ (Step 8), the following inequality holds true:

$$\|L_{i+1}-L^*\|_F\le\sqrt{\frac{1+3\delta_{3k}^2(\mathcal{A})}{1-\delta_{3k}^2(\mathcal{A})}}\,\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F+\left(\frac{\sqrt{1+3\delta_{3k}^2(\mathcal{A})}}{1-\delta_{3k}(\mathcal{A})}+\sqrt{3}\right)\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big). \quad (7)$$

Combining Lemma 2 with inequality (7), we obtain the first inequality in Theorem 1.
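Step 7 restricted to $\text{span}(\mathcal{S}_i^L)$ reduces to a least-squares problem in the at most $2k$ coefficients of the atoms, i.e. to the symmetric normal equations $B^TBc = B^T(y-\mathcal{A}M_i)$, where column $j$ of $B$ is $\mathcal{A}$ applied to the $j$-th atom. When the atoms are orthonormal, rank-RIP keeps the eigenvalues of $B^TB$ in $[1-\delta_{2k}, 1+\delta_{2k}]$, so conjugate gradient converges quickly. The sketch below assumes a dense matrix representation of $\mathcal{A}$ and uses hypothetical helper names; it is not the authors' solver.

```python
import numpy as np

def conjugate_gradient(H, b, tol=1e-10, max_iter=None):
    """Solve H c = b for a symmetric positive definite H."""
    c = np.zeros_like(b, dtype=float)
    r = b - H @ c
    d = r.copy()
    rs = r @ r
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs) <= tol:
            break
        Hd = H @ d
        step = rs / (d @ Hd)
        c += step * d
        r -= step * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d          # next H-conjugate search direction
        rs = rs_new
    return c

def ls_over_span(A, y, M, atoms):
    """Step 7 sketch: argmin over V in span(atoms) of ||y - A(V + M)||_2^2."""
    B = np.stack([A @ a.ravel() for a in atoms], axis=1)   # p x |atoms|
    H = B.T @ B                                           # symmetric normal-equation matrix
    c = conjugate_gradient(H, B.T @ (y - A @ M.ravel()))
    return sum(ci * a for ci, a in zip(c, atoms))
```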



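Input: $y$, $\mathcal{A}$, $\mathcal{A}^*$, Tolerance $\eta$, MaxIterations, $\tau_i$, $\forall i$
Initialize: $\{Q_0, M_0, L_0\}\leftarrow 0$, $\{\mathcal{L}_0, \mathcal{M}_0\}\leftarrow\{\emptyset\}$, $i\leftarrow 0$
repeat
  Low rank matrix estimation:
  $\mathcal{D}_i^L\leftarrow\text{ortho}(\mathcal{P}_k(\nabla f(Q_i)))$
  $\mathcal{S}_i^L\leftarrow\mathcal{D}_i^L\cup\mathcal{L}_i$
  $V_i^L\leftarrow Q_i^L-\frac{\mu_i^L}{2}\mathcal{P}_{\mathcal{S}_i^L}\nabla f(Q_i)$
  $L_{i+1}\leftarrow\mathcal{P}_k(V_i^L)$ with $\mathcal{L}_{i+1}\leftarrow\text{ortho}(L_{i+1})$
  $Q_{i+1}^L\leftarrow L_{i+1}+\tau_i(L_{i+1}-L_i)$
  $Q_{i+1}\leftarrow Q_{i+1}^L+Q_i^M$
  Sparse matrix estimation:
  $\mathcal{D}_i^M\leftarrow\text{supp}(\mathcal{P}_s(\nabla f(Q_{i+1})))$
  $\mathcal{S}_i^M\leftarrow\mathcal{D}_i^M\cup\mathcal{M}_i$
  $V_i^M\leftarrow Q_i^M-\frac{\mu_i^M}{2}(\nabla f(Q_{i+1}))_{\mathcal{S}_i^M}$
  $M_{i+1}\leftarrow\mathcal{P}_s(V_i^M)$ with $\mathcal{M}_{i+1}\leftarrow\text{supp}(M_{i+1})$
  $Q_{i+1}^M\leftarrow M_{i+1}+\tau_i(M_{i+1}-M_i)$
  $Q_{i+1}\leftarrow Q_{i+1}^L+Q_{i+1}^M$
  $i\leftarrow i+1$
until $\|X_i-X_{i-1}\|_2\le\eta\|X_i\|_2$ (with $X_i := L_i+M_i$) or MaxIterations.

Algorithm 2: Matrix ALPS Instance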



C. Least-squares estimates over sparse support sets 

Using techniques similar to those described above, we derive the following result for the sparse matrix estimate:

Corollary 2 (Least-squares error norm reduction over sparse support sets). Let $\mathcal{S}_i^M\subseteq\{(i,j): i\in\{1,\dots,m\},\ j\in\{1,\dots,n\}\}$ be a $2s$-sparse index set. Then, the $2s$-sparse matrix $V_i^M$ (Step 10) identifies the energy of $M^*$ over $\mathcal{S}_i^M$ such that:

$$\|V_i^M-M^*\|_F\le\frac{1}{\sqrt{1-\delta_{3s}^2(\mathcal{A})}}\|(V_i^M-M^*)_{(\mathcal{S}_i^M)^c}\|_F+\frac{1+2\delta_{2s}(\mathcal{A})}{1-\delta_{3s}(\mathcal{A})}\Big(\delta_{3s+2k}(\mathcal{A})\|L^*-L_i\|_F+\sqrt{1+\delta_{3s}(\mathcal{A})}\,\|\varepsilon\|_2\Big).$$

Following the same steps, we obtain an inequality analogous to (7) for the sparse matrix estimate.

IV. The Matrix ALPS Framework 

To accelerate the convergence of SpaRCS, we propose the MATRIX ALPS algorithm, based on acceleration techniques from convex analysis [10], [11]. At each iteration, we leverage both the low-rank and the sparse matrix estimates from previous iterations to form a gradient surrogate with low computational cost. Then, we update the current estimates using memory to gain momentum in convergence, as proposed in Nesterov's optimal gradient methods. A key ingredient is the selection of the momentum term $\tau$; constant and adaptive momentum selection strategies can be found in [11]. We reserve the analysis of the adaptive case for an extended paper.

To further improve the convergence speed, we replace the least-squares optimization steps with first-order gradient descent updates; the step sizes $\mu_i^L$, $\mu_i^M$ are selected as in [11].
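A compact NumPy sketch of Algorithm 2 with constant momentum $\tau$ follows. Two choices here are our assumptions rather than details stated in the text: the step sizes use exact line search along the restricted gradient, $\mu = \|\mathcal{P}_{\mathcal{S}}g\|_F^2/\|\mathcal{A}\mathcal{P}_{\mathcal{S}}g\|_2^2$ with $g = \mathcal{A}^*(y-\mathcal{A}Q)$ (a common choice in ALPS-type methods; see [11] for the authors' rules), and the projection onto $\text{span}(\mathcal{S}_i^L)$ orthonormalizes the vectorized atoms with a QR factorization. $\mathcal{A}$ is again a dense $p\times mn$ matrix, and all helper names are hypothetical.

```python
import numpy as np

def _rank1_atoms(X, r):
    """ortho(P_r(X)): the r leading orthonormal rank-1 atoms u_j v_j^T of X."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return [np.outer(U[:, j], Vt[j]) for j in range(r)]

def _proj_span(atoms, X):
    """Orthogonal projection of X onto span(atoms), via QR of the vectorized atoms."""
    Qb, _ = np.linalg.qr(np.stack([a.ravel() for a in atoms], axis=1))
    return (Qb @ (Qb.T @ X.ravel())).reshape(X.shape)

def _best_rank(X, k):
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * sv[:k]) @ Vt[:k]

def _best_sparse(X, s):
    out = np.zeros_like(X)
    idx = np.unravel_index(np.argsort(np.abs(X), axis=None)[::-1][:s], X.shape)
    out[idx] = X[idx]
    return out

def _line_search(A, D):
    """Assumed step rule: exact minimizer of ||y - A(Q + mu D)||^2 when D = P_S A^*(y - A Q)."""
    den = np.sum((A @ D.ravel()) ** 2)
    return np.sum(D ** 2) / den if den > 0 else 0.0

def matrix_alps(A, y, m, n, k, s, tau=0.25, eta=1e-7, max_iter=500):
    """Sketch of Algorithm 2 with constant momentum tau; A is a dense p x (m*n) array."""
    def neg_half_grad(X):                      # A^*(y - A X) = -(1/2) grad f(X)
        return (A.T @ (y - A @ X.ravel())).reshape(m, n)

    L = np.zeros((m, n)); M = np.zeros((m, n))
    QL = np.zeros((m, n)); QM = np.zeros((m, n))
    for _ in range(max_iter):
        X_old = L + M
        # Low-rank part: restricted gradient step at Q_i = Q_i^L + Q_i^M, then P_k.
        g = neg_half_grad(QL + QM)
        S_L = _rank1_atoms(g, k) + (_rank1_atoms(L, k) if L.any() else [])
        D = _proj_span(S_L, g)
        L_new = _best_rank(QL + _line_search(A, D) * D, k)
        QL = L_new + tau * (L_new - L)         # momentum: Q_{i+1}^L
        # Sparse part: gradient at Q_{i+1} = Q_{i+1}^L + Q_i^M, restricted to S_i^M.
        g = neg_half_grad(QL + QM)
        mask = (_best_sparse(g, s) != 0) | (M != 0)
        D = np.where(mask, g, 0.0)
        M_new = _best_sparse(QM + _line_search(A, D) * D, s)
        QM = M_new + tau * (M_new - M)         # momentum: Q_{i+1}^M
        L, M = L_new, M_new
        if np.linalg.norm(L + M - X_old) <= eta * np.linalg.norm(L + M):
            break
    return L, M
```

With `tau=0` the momentum terms vanish and the loop reduces to a memory-less projected gradient scheme, which is a convenient baseline when experimenting with the momentum choice.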

The best projection of an arbitrary matrix onto the set of low-rank matrices requires sophisticated matrix decompositions such as the Singular Value Decomposition (SVD). Using the Lanczos approach, we require $O(kmn)$ arithmetic operations to compute a rank-$k$ matrix approximation for a given constant accuracy, a prohibitive time complexity that does not scale well for many practical applications. Alternatives to the SVD can be found in [4], [12]. Furthermore, [13] includes $\epsilon$-approximate low-rank matrix projections in the recovery process and studies their effect on convergence.
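To illustrate the kind of SVD alternative referred to above, the sketch below implements the standard randomized range finder (Gaussian sketch, a few power iterations, QR, then a small SVD). It is a generic technique that we assume is representative of randomized QR-based projections; it is not necessarily the exact scheme of [4] or [12], and the oversampling and power-iteration parameters are illustrative defaults.

```python
import numpy as np

def randomized_rank_k(X, k, oversample=5, power_iters=2, seed=0):
    """Approximate P_k(X) with a randomized range finder (Gaussian sketch + QR)."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((X.shape[1], k + oversample))
    Y = X @ Omega                        # sample the range of X
    for _ in range(power_iters):         # power iterations sharpen a slowly decaying spectrum
        Y, _ = np.linalg.qr(Y)
        Y = X @ (X.T @ Y)
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for the sampled range
    Ub, sv, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)   # small (k+oversample) x n SVD
    return ((Q @ Ub[:, :k]) * sv[:k]) @ Vt[:k]
```

Each pass costs $O(mn(k+q))$ operations for $q$ oversampling columns, and the only SVD is of a $(k+q)\times n$ matrix.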

The following theorem characterizes Algorithm 2 for the noiseless case using a constant momentum step size selection strategy. 

Theorem 2. Let $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ be a linear operator satisfying rank-RIP and sparse-RIP with constants $\delta_{4k}(\mathcal{A})\le 0.09$ and $\delta_{4s}(\mathcal{A})\le 0.095$, respectively. Furthermore, assume constant momentum step size selection with $\tau_i = 1/4$, $\forall i$. We consider the noiseless case, where the set of observations satisfies $y = \mathcal{A}X^*$ for $X^* := L^*+M^*$ as defined in the Problem. Then, Algorithm 2 satisfies the following second-order linear system:

$$x(i+1)\le(1+\tau)\mathbf{A}\,x(i)+\tau\mathbf{A}\,x(i-1), \quad (8)$$

where $x(i) := \begin{bmatrix}\|L_i-L^*\|_F\\ \|M_i-M^*\|_F\end{bmatrix}$ and $\mathbf{A} := \begin{bmatrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{bmatrix}$ depends on the RIP constants $\delta_{4k}(\mathcal{A})$ and $\delta_{4s}(\mathcal{A})$. Furthermore, the above inequality can be transformed into the following first-order linear system:

$$w(i+1)\le\begin{bmatrix}(1+\tau)\mathbf{A}&\tau\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}^{i+1}w(0), \quad (9)$$

for $w(i) := \begin{bmatrix}x(i+1)\\ x(i)\end{bmatrix}$. We observe that $\lim_{i\to\infty}w(i) = 0$, since the eigenvalues $\lambda_j$ of the block matrix in (9) satisfy $|\lambda_j|<1$, $\forall j$.

Due to space constraints, we reserve the proof as well as the noisy analog of Theorem 2 for an extended version of the paper.

Fig. 1. Comparison table for the matrix completion problem. The table depicts median values over 50 Monte-Carlo iterations; results are separated by "/". The list of algorithms includes: SpaRCS [5] / ALM [14] / GROUSE [15] / SVP [16] / MATRIX ALPS. Bold numbers highlight the fastest convergence in execution time. A dash denotes either no information or not applicable due to slow convergence.
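The eigenvalue condition behind (9) is easy to check numerically once $\mathbf{A}$ is known. Since the entries $A_{11},\dots,A_{22}$ are not given explicitly here, the sketch below uses a hypothetical nonnegative $\mathbf{A}$ (not derived from any RIP constants) together with $\tau = 1/4$, forms the block matrix of (9), and iterates the bound. Each eigenvalue $\lambda$ of $\mathbf{A}$ yields two eigenvalues $\nu$ of the block matrix, namely the roots of $\nu^2-(1+\tau)\lambda\nu-\tau\lambda = 0$.

```python
import numpy as np

def block_matrix(A2, tau):
    """[[(1+tau) A, tau A], [I, 0]]: the transition matrix of the first-order system (9)."""
    d = A2.shape[0]
    return np.block([[(1 + tau) * A2, tau * A2], [np.eye(d), np.zeros((d, d))]])

A2 = np.array([[0.3, 0.1], [0.1, 0.3]])   # hypothetical entries, NOT computed from RIP constants
tau = 0.25
C = block_matrix(A2, tau)
rho = float(max(abs(np.linalg.eigvals(C))))
print(f"spectral radius {rho:.4f}")

x0 = np.array([1.0, 1.0])                  # x(0) = [||L*||_F, ||M*||_F] with zero initialization
w = np.concatenate([(1 + tau) * A2 @ x0, x0])   # w(0) = [x(1); x(0)]
for _ in range(60):
    w = C @ w                              # C >= 0 entrywise, so the bound propagates
print("bound on [||L_i - L*||_F, ||M_i - M*||_F]:", w[2:])
```

Here the eigenvalues of $\mathbf{A}$ are $0.4$ and $0.2$, and the largest root for $\lambda = 0.4$ gives a spectral radius of about $0.653$, so the bound decays geometrically.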

V. Experiments 

Robust matrix completion¹: The rank-$k$ matrix $X^*\in\mathbb{R}^{m\times n}$ is synthesized as $X^* = UR^T$, where $U\in\mathbb{R}^{m\times k}$ and $R\in\mathbb{R}^{n\times k}$, and $\|X^*\|_F = 1$. We subsample $X^*$ by observing $p = 0.3mn$ entries, drawn uniformly at random. The set of observations satisfies $y = \mathcal{A}_{\Omega}X^*+\varepsilon$. Here, $\Omega$ denotes the set of ordered pairs that represent the coordinates of the observable entries, and $\mathcal{A}_{\Omega}$ denotes the linear operator (mask) that subsamples a matrix according to $\Omega$.
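The data model above can be sketched as follows, together with the mask operator $\mathcal{A}_{\Omega}$ and its adjoint (which scatters observations back onto $\Omega$). This is our own illustrative code: $\Omega$ is stored as row-major linear indices rather than ordered pairs, the sampling ratio is a free parameter, and the function names are hypothetical.

```python
import numpy as np

def completion_instance(m, n, k, ratio, noise_std=0.0, seed=0):
    """Rank-k X* = U R^T with ||X*||_F = 1, observed on round(ratio*m*n) uniformly random entries."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
    X /= np.linalg.norm(X)                            # Frobenius normalization
    p = int(round(ratio * m * n))
    omega = rng.choice(m * n, size=p, replace=False)  # Omega as row-major linear indices
    y = A_omega(X, omega) + noise_std * rng.standard_normal(p)
    return X, omega, y

def A_omega(X, omega):
    """Mask operator: the entries of X indexed by Omega."""
    return X.ravel()[omega]

def A_omega_adj(v, omega, shape):
    """Adjoint of the mask: place v on Omega, zeros elsewhere."""
    out = np.zeros(shape[0] * shape[1])
    out[omega] = v
    return out.reshape(shape)
```

Because $\mathcal{A}_{\Omega}^*\mathcal{A}_{\Omega}$ simply zeroes the unobserved entries, gradient steps for matrix completion never require forming $\mathcal{A}$ as a dense matrix.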

We generate various problem configurations, both for noisy and noiseless settings. All the algorithms are tested on the same signal-matrix-noise realizations and use the same tolerance parameter $\eta = 10^{-4}$. For fairness, we modified all the algorithms so that they exploit the true rank. For low-rank projections, we use the PROPACK package [17], except for [15], which is SVD-less. We changed the maximum number of cycles in [15] from 150 to 30 to improve its speed. A summary of the results can be found in Fig. 1. We observe that MATRIX ALPS has better phase transition performance. A complete comparison using randomized low-rank projection schemes in MATRIX ALPS is provided in the extended paper.

RPCA: We consider the problem of background subtraction in video sequences: static background scenes are considered low-rank, while moving foreground objects are sparse. Using the complete set of measurements, this problem falls under the RPCA framework. We apply the GoDec algorithm [4] and the MATRIX ALPS scheme to a $144\times 176\times 200$ video sequence. Both solvers use the same low-rank projection operators, based on randomized QR factorization ideas [4], [12]. Representative results are depicted in Fig. 2.



VI. Conclusions

We study the general problem of sparse plus low-rank matrix recovery from incomplete and noisy data. The problem under consideration includes various low-dimensional models as special cases, such as sparse signal reconstruction, affine rank minimization and robust PCA. Building on the SpaRCS algorithm, we derive improved conditions on the restricted isometry constants that guarantee the success of reconstruction. Furthermore, we show that the memory-based scheme provides a significant computational advantage over both convex and non-convex approaches.

VII. Appendix 
A. Proof of Lemma 2

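Given $\mathcal{L}^* := \text{ortho}(L^*)$, we define the following quantities: $\mathcal{S}_i^L := \mathcal{L}_i\cup\mathcal{D}_i^L$ and $\widehat{\mathcal{S}}_i^L := \mathcal{L}_i\cup\mathcal{L}^*$. Then:

$$\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}=\mathcal{P}_{\mathcal{D}_i^L\setminus(\mathcal{L}^*\cup\mathcal{L}_i)}\quad\text{and}\quad\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}=\mathcal{P}_{\mathcal{L}^*\setminus(\mathcal{D}_i^L\cup\mathcal{L}_i)}. \quad (10)$$

Since the subspace defined in $\mathcal{D}_i^L$ is the best rank-$k$ subspace, orthogonal to the subspace spanned by $\mathcal{L}_i$, the following holds true:

$$\|\mathcal{P}_{\mathcal{D}_i^L\setminus\mathcal{L}_i}\nabla f(X_i)\|_F^2\ge\|\mathcal{P}_{\mathcal{L}^*\setminus\mathcal{L}_i}\nabla f(X_i)\|_F^2\;\Rightarrow\;\|\mathcal{P}_{\mathcal{S}_i^L}\nabla f(X_i)\|_F^2\ge\|\mathcal{P}_{\widehat{\mathcal{S}}_i^L}\nabla f(X_i)\|_F^2. \quad (11)$$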



¹Codes are available for MATLAB at http://lions.epfl.ch/MatrixALPS
Fig. 2. Background subtraction in a video sequence. Median execution times over 10 Monte-Carlo iterations: GoDec, 34.8 sec; MATRIX ALPS, 15.8 sec.



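Removing the common subspaces in $\mathcal{S}_i^L$ and $\widehat{\mathcal{S}}_i^L$, and using $\nabla f(X_i) = -2\mathcal{A}^*\big(\mathcal{A}(L^*-L_i)+\mathcal{A}(M^*-M_i)+\varepsilon\big)$, we get:

$$\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\varepsilon\|_F^2\ge\|\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)+\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)+\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\varepsilon\|_F^2. \quad (12)$$

On the left-hand side, we have:

$$\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\varepsilon\|_F \quad (13)$$

$$\overset{(i)}{\le}\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)\|_F+\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\varepsilon\|_F+\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)\|_F, \quad (14)$$

where (i) is due to the triangle inequality for the Frobenius norm. The first two terms in the above expression can be bounded using tools in [13]. For the third term, we use Lemma 3.2 in [5], from which we conclude that $\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)\|_F\le\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F$. Thus:

$$\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)+\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\varepsilon\|_F \quad (15)$$

$$\le 2\delta_{3k}(\mathcal{A})\|L^*-L_i\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\|\mathcal{P}_{\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L}\mathcal{A}^*\varepsilon\|_F. \quad (16)$$

Similarly, using ideas from [13] for the right-hand side, we calculate:

$$\|\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(L^*-L_i)+\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)+\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\varepsilon\|_F \quad (17)$$

$$\ge\|\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}(L^*-L_i)\|_F-2\delta_{2k}(\mathcal{A})\|L^*-L_i\|_F-\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F-\|\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}\mathcal{A}^*\varepsilon\|_F. \quad (18)$$

Combining the above inequalities, and noting that $\mathcal{P}_{\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L}(L^*-L_i) = \mathcal{P}_{\mathcal{L}^*\setminus\mathcal{S}_i^L}(L^*-L_i)$ and that the two noise projections act on orthogonal subspaces, we get:

$$\|\mathcal{P}_{\mathcal{L}^*\setminus\mathcal{S}_i^L}(L^*-L_i)\|_F\le\big(2\delta_{2k}(\mathcal{A})+2\delta_{3k}(\mathcal{A})\big)\|L^*-L_i\|_F+2\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{2}\,\|\mathcal{P}_{(\mathcal{S}_i^L\setminus\widehat{\mathcal{S}}_i^L)\cup(\widehat{\mathcal{S}}_i^L\setminus\mathcal{S}_i^L)}\mathcal{A}^*\varepsilon\|_F \quad (19)$$

$$\le\big(2\delta_{2k}(\mathcal{A})+2\delta_{3k}(\mathcal{A})\big)\|L^*-L_i\|_F+2\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{2(1+\delta_{2k}(\mathcal{A}))}\,\|\varepsilon\|_2. \quad (20)$$

To prove Corollary 1, we follow the same ideas based on [6], [13], [18].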

B. Proof of Lemma 3

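We observe that $V_i^L-L^*$ is decomposed as follows:

$$\|V_i^L-L^*\|_F^2=\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F^2+\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F^2. \quad (21)$$

$V_i^L$ is the minimizer over the low-rank subspace spanned by $\mathcal{S}_i^L$, with $\text{rank}(\text{span}(\mathcal{S}_i^L))\le 2k$. Using the optimality condition over the convex set $\Theta = \{X: X\in\text{span}(\mathcal{S}_i^L)\}$, we have:

$$\langle\nabla f(V_i^L),\mathcal{P}_{\mathcal{S}_i^L}(L^*-V_i^L)\rangle=0\;\Leftrightarrow\;\langle\mathcal{A}V_i^L-(y-\mathcal{A}M_i),\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\rangle=0, \quad (22)$$

for $\mathcal{P}_{\mathcal{S}_i^L}L^*\in\text{span}(\mathcal{S}_i^L)$, where here $\nabla f$ denotes the gradient of $V\mapsto\|y-\mathcal{A}(V+M_i)\|_2^2$. Given condition (22), the first term on the right-hand side of (21) becomes: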



1 1 Vsc ( vf - L* ) j I ^ = (Vf - L* , Vsc ( vf - L* )) (23) 

= (Vf - L*, P^r (Vf - L*)) - (AVf - (y - AM,),AVsc (Vf - L*)) (24) 

= (Vf - L*, P^r (Vf - L*)) - (AVf - (A(L* + M*) + e - AM,), ^P^^ (Vf - L*)) (25) 
= (Vf - L*,p5r (Vf - L*)) - (Vf - L* - (M* - MO, A*Ap5z:(Vf - L*)) 
+ (e,Ap5r(Vf-L*)) (26) 

= (Vf - L*, (/ - A*A)Vsc (Vf - L*)) + (M* - M.,,A*AVsc (Vf - L*)) 
+ (e,Ap5r(Vf-L*)) (27) 

< |(Vf - L*, (J - A*A)Vs^{^f - L*))l + 1(M* - M.,,A'AVsc{Vf - L*))| 
+ (e,Ap5r(Vf-L*)) (28) 

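According to Lemma 10 in [11], we know that:

$$|\langle V_i^L-L^*,(\mathbf{I}-\mathcal{A}^*\mathcal{A})\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\rangle|=|\langle V_i^L-L^*,(\mathbf{I}-\mathcal{P}_{\mathcal{S}_i^L\cup\mathcal{L}^*}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L\cup\mathcal{L}^*})\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\rangle| \quad (29)$$

$$\le\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F, \quad (30)$$

given the facts that $V_i^L-L^*\in\text{span}(\mathcal{S}_i^L\cup\mathcal{L}^*)$, and thus $\mathcal{P}_{\mathcal{S}_i^L\cup\mathcal{L}^*}(V_i^L-L^*)=V_i^L-L^*$, and that $\mathcal{P}_{\mathcal{S}_i^L\cup\mathcal{L}^*}\mathcal{P}_{\mathcal{S}_i^L}=\mathcal{P}_{\mathcal{S}_i^L}$, since $\text{span}(\mathcal{S}_i^L)\subseteq\text{span}(\mathcal{S}_i^L\cup\mathcal{L}^*)$. The last inequality is due to Lemma 3 in [13]. Focusing on the term $|\langle M^*-M_i,\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\rangle|$, we derive the following:

$$|\langle M^*-M_i,\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\rangle|=|\langle\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*),\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)\rangle| \quad (31)$$

$$\le\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\,\|\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)\|_F\le\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\,\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F, \quad (32)$$

using Lemma 3.2 in [5]. Then, (28) becomes:

$$\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F^2\le\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F+\delta_{2k+2s}(\mathcal{A})\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\|\varepsilon\|_2, \quad (33)$$

where the last term is bounded using Lemma 1 in [13]. Simplifying the above quadratic expression, we obtain:

$$\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\le\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2. \quad (34)$$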

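As a consequence, (21) can be upper bounded by:

$$\|V_i^L-L^*\|_F^2\le\Big(\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big)^2+\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F^2. \quad (35)$$

We form the quadratic polynomial for this inequality, treating $\|V_i^L-L^*\|_F$ as the unknown variable. Bounding by the largest root of the resulting polynomial, we get:

$$\|V_i^L-L^*\|_F\le\frac{1}{\sqrt{1-\delta_{3k}^2(\mathcal{A})}}\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F+\frac{1}{1-\delta_{3k}(\mathcal{A})}\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big).$$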
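For completeness, the largest-root step can be written out explicitly. Abbreviating $x := \|V_i^L-L^*\|_F$, $b := \|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}(V_i^L-L^*)\|_F$, $\delta := \delta_{3k}(\mathcal{A})$ and $c := \delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2$ (shorthand introduced only for this derivation), inequality (35) reads $x^2\le(\delta x+c)^2+b^2$, and:

```latex
\begin{aligned}
x^2 \le (\delta x + c)^2 + b^2
  &\iff (1-\delta^2)\,x^2 - 2\delta c\,x - (c^2 + b^2) \le 0 \\
  &\implies x \le \frac{\delta c + \sqrt{\delta^2 c^2 + (1-\delta^2)(c^2 + b^2)}}{1-\delta^2}
     = \frac{\delta c + \sqrt{c^2 + (1-\delta^2)\,b^2}}{1-\delta^2} \\
  &\implies x \le \frac{(1+\delta)\,c + \sqrt{1-\delta^2}\,b}{1-\delta^2}
     = \frac{b}{\sqrt{1-\delta^2}} + \frac{c}{1-\delta}.
\end{aligned}
```

The last step uses $\sqrt{u+v}\le\sqrt{u}+\sqrt{v}$ and $(1+\delta)/(1-\delta^2) = 1/(1-\delta)$.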

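C. Proof of Inequality (7)

Proof: We observe the following:

$$\|L_{i+1}-L^*\|_F^2=\|L_{i+1}-V_i^L+V_i^L-L^*\|_F^2 \quad (36)$$

$$=\|(V_i^L-L^*)-(V_i^L-L_{i+1})\|_F^2 \quad (37)$$

$$=\|V_i^L-L^*\|_F^2+\|V_i^L-L_{i+1}\|_F^2-2\langle V_i^L-L^*,V_i^L-L_{i+1}\rangle. \quad (38)$$

Focusing on the right-hand side of (38), the term $\langle V_i^L-L^*,V_i^L-L_{i+1}\rangle=\langle V_i^L-L^*,\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle$ can be analysed similarly to (30). Using the optimality condition as in (22), we obtain the following expression:

$$|\langle V_i^L-L^*,\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle|=|\langle V_i^L-L^*,\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle-\langle V_i^L-L^*-(M^*-M_i),\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle+\langle\varepsilon,\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle|$$

$$=|\langle V_i^L-L^*,(\mathbf{I}-\mathcal{A}^*\mathcal{A})\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle+\langle M^*-M_i,\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle+\langle\varepsilon,\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L_{i+1})\rangle|$$

$$\le\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|V_i^L-L_{i+1}\|_F+\|\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(M^*-M_i)\|_F\|V_i^L-L_{i+1}\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\|V_i^L-L_{i+1}\|_F$$

$$\le\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|V_i^L-L_{i+1}\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F\|V_i^L-L_{i+1}\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\|V_i^L-L_{i+1}\|_F. \quad (39)$$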

Now, expression (38) can be further transformed as:

$$\|L_{i+1}-L^*\|_F^2=\|V_i^L-L^*\|_F^2+\|V_i^L-L_{i+1}\|_F^2-2\langle V_i^L-L^*,V_i^L-L_{i+1}\rangle \quad (40)$$

$$\le\|V_i^L-L^*\|_F^2+\|V_i^L-L_{i+1}\|_F^2+2|\langle V_i^L-L^*,V_i^L-L_{i+1}\rangle| \quad (41)$$

$$\overset{(i)}{\le}\|V_i^L-L^*\|_F^2+\|V_i^L-L_{i+1}\|_F^2+2\Big(\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|V_i^L-L_{i+1}\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F\|V_i^L-L_{i+1}\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\|V_i^L-L_{i+1}\|_F\Big), \quad (42)$$

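where (i) is due to (39). Using the inequality $\|L_{i+1}-V_i^L\|_F\le\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F$, we get:

$$\|L_{i+1}-L^*\|_F^2\le\|V_i^L-L^*\|_F^2+\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F^2+2\Big(\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F+\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F\Big). \quad (43)$$

Furthermore, replacing $\|\mathcal{P}_{\mathcal{S}_i^L}(V_i^L-L^*)\|_F$ with its upper bound in (34), we compute:

$$\|L_{i+1}-L^*\|_F^2\le\big(1+3\delta_{3k}^2(\mathcal{A})\big)\|V_i^L-L^*\|_F^2+6\delta_{3k}(\mathcal{A})\|V_i^L-L^*\|_F\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big)+3\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big)^2$$

$$\overset{(i)}{\le}\big(1+3\delta_{3k}^2(\mathcal{A})\big)\left(\|V_i^L-L^*\|_F+\sqrt{\frac{3}{1+3\delta_{3k}^2(\mathcal{A})}}\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big)\right)^2,$$

where (i) is obtained by completing the square and eliminating negative terms. Thus:

$$\|L_{i+1}-L^*\|_F\le\sqrt{1+3\delta_{3k}^2(\mathcal{A})}\left(\|V_i^L-L^*\|_F+\sqrt{\frac{3}{1+3\delta_{3k}^2(\mathcal{A})}}\Big(\delta_{2k+2s}(\mathcal{A})\|M^*-M_i\|_F+\sqrt{1+\delta_{2k}(\mathcal{A})}\,\|\varepsilon\|_2\Big)\right).$$

Furthermore, we exploit Lemma 3 to obtain inequality (7). ■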

D. Proof of Theorem 2

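Here, we prove the convergence of Algorithm 2, both for the low-rank and the sparse matrix estimate part, and then combine the corresponding theoretical results. Let $\mathcal{L}^*\leftarrow\text{ortho}(L^*)$ be a set of orthonormal, rank-1 matrices that span the range of $L^*$, and let $\mathcal{M}^*$ be the set of indices of the non-zero elements in $M^*$. For the low-rank matrix estimate, we observe the following:

$$\|L_{i+1}-V_i^L\|_F^2\le\|L^*-V_i^L\|_F^2\;\Rightarrow \quad (44)$$

$$\|L_{i+1}-L^*+L^*-V_i^L\|_F^2\le\|L^*-V_i^L\|_F^2\;\Rightarrow \quad (45)$$

$$\|L_{i+1}-L^*\|_F^2+\|V_i^L-L^*\|_F^2+2\langle L_{i+1}-L^*,L^*-V_i^L\rangle\le\|L^*-V_i^L\|_F^2\;\Rightarrow \quad (46)$$

$$\|L_{i+1}-L^*\|_F^2\le 2\langle L_{i+1}-L^*,V_i^L-L^*\rangle, \quad (47)$$

where (44) holds since $L_{i+1}$ is the best rank-$k$ approximation of $V_i^L$ and $L^*$ has rank $k$. From Algorithm 2, it is obvious that (i) $V_i^L\in\text{span}(\mathcal{S}_i^L)$, (ii) $Q_i^L\in\text{span}(\mathcal{S}_i^L)$ and (iii) $L_{i+1}\in\text{span}(\mathcal{S}_i^L)$. We define $\mathcal{E} := \mathcal{S}_i^L\cup\mathcal{L}^*$, where $\text{rank}(\text{span}(\mathcal{E}))\le 4k$, and let $\mathcal{P}_{\mathcal{E}}$ be the orthogonal projection onto the subspace defined by $\mathcal{E}$. We highlight that $\mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_i^L}=\mathcal{P}_{\mathcal{S}_i^L}$.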

Since L,+i — L* G span(f ) and Vf — L* G span(f ), the following hold true: 

U+i - L* = Ts{U+i - L*) and Vf - L* = p£(Vf - L*). (48) 
Then, \41) can be written as: 

j|L«+i-L*||^ (49) 

<2{p£(L,+i-L*),p£(Vf-L*)) (50) 

= 2{Ve(U+i - L*),p£ (qf + fifTscA^iy - AQO - L*)) (51) 

= 2(L,+i - L*,p£(Qf - L*) + fifPePsf (A*(A(L* + M*) - AQ,))) (52) 

= 2(L,+i - L*,p£(Qf - L*) + t^fVeVsf (^*A(L* + M*) - A*A(Qf + Qf )) (53) 

= 2(L.+i - L*,p£(Qf - L*) - t^fVeVsfA'AiQ^ - L") - fi^VsVscA'AiQt^ - M*)) (54) 
= 2(L.+i - L*,p£(Qf - L*) - iifPeVsfA'APEiQf' - L*)) - 2Atf (L,+i - L* ,PerscA'-A{Qt^ - M*)) (55) 
= 2{L,:+i - L*,p£(Qf - L*) - fifPeVsfA'AlPsf +r^c]r£iQf - L*)) 

- 2Mf (L.+i - L\r£rscA*AiQt^ - M*)) (56) 
due to VsiQf- - L*) := VgcVeiQi - L*) + P^cPfilQf - L*). The first term in Jsel salsifies: 

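$$2\langle L_{i+1}-L^*,\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)-\mu_i^L\mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\big[\mathcal{P}_{\mathcal{S}_i^L}+\mathcal{P}_{\mathcal{S}_i^L}^{\perp}\big]\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\rangle$$

$$\le 2\|L_{i+1}-L^*\|_F\,\|(\mathbf{I}-\mu_i^L\mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L})\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F+2\mu_i^L\|L_{i+1}-L^*\|_F\,\|\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}^{\perp}\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F$$

$$\le\frac{4\delta_{3k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}\|L_{i+1}-L^*\|_F\|Q_i^L-L^*\|_F+\frac{2\delta_{4k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}\big(2\delta_{3k}(\mathcal{A})+2\delta_{4k}(\mathcal{A})\big)\|L_{i+1}-L^*\|_F\|Q_i^L-L^*\|_F, \quad (57)$$

where (57) holds since $\frac{1}{1+\delta_{3k}(\mathcal{A})}\le\mu_i^L\le\frac{1}{1-\delta_{3k}(\mathcal{A})}$, using Lemma 3 in [13]:

$$\lambda\big(\mathbf{I}-\mu_i^L\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}\big)\in\left[1-\frac{1+\delta_{3k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})},\;1-\frac{1-\delta_{3k}(\mathcal{A})}{1+\delta_{3k}(\mathcal{A})}\right], \quad (58)$$

and thus:

$$\|(\mathbf{I}-\mu_i^L\mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L})\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F\le\frac{2\delta_{3k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}\|\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F. \quad (59)$$

Furthermore, according to Lemma 4 in [13]:

$$\|\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}\mathcal{P}_{\mathcal{S}_i^L}^{\perp}\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F\le\delta_{4k}(\mathcal{A})\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F, \quad (60)$$

since $\text{rank}(\mathcal{P}_{\mathcal{S}_i^L\cup\mathcal{E}}Q)\le 4k$, $\forall Q\in\mathbb{R}^{m\times n}$. Moreover:

$$\|\mathcal{P}_{\mathcal{S}_i^L}^{\perp}\mathcal{P}_{\mathcal{E}}(Q_i^L-L^*)\|_F\le\big(2\delta_{3k}(\mathcal{A})+2\delta_{4k}(\mathcal{A})\big)\|Q_i^L-L^*\|_F, \quad (61)$$

using ideas from Lemma 2.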

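The second term in (56) satisfies:

$$-2\mu_i^L\langle L_{i+1}-L^*,\mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(Q_i^M-M^*)\rangle\le\frac{2}{1-\delta_{3k}(\mathcal{A})}\|L_{i+1}-L^*\|_F\,\|\mathcal{P}_{\mathcal{S}_i^L}\mathcal{A}^*\mathcal{A}(Q_i^M-M^*)\|_F\le\frac{2\delta_{4k+3s}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}\|L_{i+1}-L^*\|_F\|Q_i^M-M^*\|_F,$$

using Lemma 3.2 in [5]. Replacing the above results in (56), we compute:

$$\|L_{i+1}-L^*\|_F\le\alpha\|Q_i^L-L^*\|_F+\beta\|Q_i^M-M^*\|_F, \quad (62)$$

where $\alpha := \frac{4\delta_{3k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}+\frac{2\delta_{4k}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}\big(2\delta_{3k}(\mathcal{A})+2\delta_{4k}(\mathcal{A})\big)$ and $\beta := \frac{2\delta_{4k+3s}(\mathcal{A})}{1-\delta_{3k}(\mathcal{A})}$. Following similar steps for the sparse matrix estimate part, we end up with the following inequality bound for $M_{i+1}$:

$$\|M_{i+1}-M^*\|_F\le\gamma\|Q_i^M-M^*\|_F+\zeta\|Q_i^L-L^*\|_F, \quad (63)$$

where $\gamma$ and $\zeta$ are constants that depend on the RIP constants of $\mathcal{A}$ and are derived analogously to $\alpha$ and $\beta$.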



Furthermore:

$$\|Q_i^L-L^*\|_F=\|L_i+\tau_i(L_i-L_{i-1})-L^*\|_F=\|(1+\tau_i)(L_i-L^*)+\tau_i(L^*-L_{i-1})\|_F\le(1+\tau_i)\|L_i-L^*\|_F+\tau_i\|L_{i-1}-L^*\|_F \quad (64)$$

and

$$\|Q_i^M-M^*\|_F=\|M_i+\tau_i(M_i-M_{i-1})-M^*\|_F=\|(1+\tau_i)(M_i-M^*)+\tau_i(M^*-M_{i-1})\|_F\le(1+\tau_i)\|M_i-M^*\|_F+\tau_i\|M_{i-1}-M^*\|_F. \quad (65)$$

Combining (64) and (65) with (62) and (63), we get:

$$\|L_{i+1}-L^*\|_F\le\alpha(1+\tau_i)\|L_i-L^*\|_F+\alpha\tau_i\|L_{i-1}-L^*\|_F+\beta(1+\tau_i)\|M_i-M^*\|_F+\beta\tau_i\|M_{i-1}-M^*\|_F \quad (66)$$

and

$$\|M_{i+1}-M^*\|_F\le\gamma(1+\tau_i)\|M_i-M^*\|_F+\gamma\tau_i\|M_{i-1}-M^*\|_F+\zeta(1+\tau_i)\|L_i-L^*\|_F+\zeta\tau_i\|L_{i-1}-L^*\|_F. \quad (67)$$

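The inequalities (66) and (67) define the following coupled set of inequalities:

$$\begin{bmatrix}\|L_{i+1}-L^*\|_F\\ \|M_{i+1}-M^*\|_F\end{bmatrix}\le(1+\tau_i)\mathbf{A}\begin{bmatrix}\|L_i-L^*\|_F\\ \|M_i-M^*\|_F\end{bmatrix}+\tau_i\mathbf{A}\begin{bmatrix}\|L_{i-1}-L^*\|_F\\ \|M_{i-1}-M^*\|_F\end{bmatrix}, \quad (68)$$

where $\mathbf{A} := \begin{bmatrix}\alpha&\beta\\ \zeta&\gamma\end{bmatrix}$. Furthermore, we define $x(i) := \begin{bmatrix}\|L_i-L^*\|_F\\ \|M_i-M^*\|_F\end{bmatrix}$ to obtain inequality (8). We can convert this second-order linear system into a two-dimensional first-order system, where the variables of the linear system are multi-dimensional. To achieve this, we define a new state variable $y(i)$ where:

$$y(i) := x(i+1), \quad (69)$$

and thus $y(i+1) := x(i+2)$. Using the new variable above, we define the following two-dimensional first-order system:

$$y(i+1)-(1+\tau_i)\mathbf{A}\,y(i)-\tau_i\mathbf{A}\,x(i)\le 0,\qquad x(i+1)\le y(i),$$

which, moreover, defines the following linear system that characterizes the evolution of the two state variables $\{y(i), x(i)\}$:

$$\begin{bmatrix}y(i+1)\\ x(i+1)\end{bmatrix}\le\begin{bmatrix}(1+\tau_i)\mathbf{A}&\tau_i\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}\begin{bmatrix}y(i)\\ x(i)\end{bmatrix} \quad (70)$$

$$\Leftrightarrow\;\begin{bmatrix}x(i+2)\\ x(i+1)\end{bmatrix}\le\begin{bmatrix}(1+\tau_i)\mathbf{A}&\tau_i\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}\begin{bmatrix}x(i+1)\\ x(i)\end{bmatrix}, \quad (71)$$

with well-defined initial conditions $x(0) := \begin{bmatrix}\|L^*\|_F\\ \|M^*\|_F\end{bmatrix}$ and $y(0) = x(1) = (1+\tau_i)\mathbf{A}\,x(0)$. For $w(i) := \begin{bmatrix}x(i+1)\\ x(i)\end{bmatrix}$, we obtain the linear system:

$$w(i+1)\le\begin{bmatrix}(1+\tau_i)\mathbf{A}&\tau_i\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}w(i). \quad (72)$$

Since the block matrix has nonnegative entries, the entrywise inequality is preserved under repeated multiplication. Unfolding the recursion with constant $\tau_i = \tau$, we get inequality (9):

$$w(i+1)\le\begin{bmatrix}(1+\tau)\mathbf{A}&\tau\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}^{i+1}w(0). \quad (73)$$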



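Assuming $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ is a linear operator satisfying rank-RIP and sparse-RIP with constants $\delta_{4k}(\mathcal{A})\le 0.09$ and $\delta_{4s}(\mathcal{A})\le 0.095$, respectively, and jointly satisfying the low-rank and sparse RIP with constant $\delta_{4k+3s}(\mathcal{A})\le 0.095$, we observe that the eigenvalues of the block matrix $\widehat{\mathbf{A}} := \begin{bmatrix}(1+\tau)\mathbf{A}&\tau\mathbf{A}\\ \mathbf{I}&\mathbf{0}\end{bmatrix}$ in (73) are distinct and real and satisfy $|\lambda_j(\widehat{\mathbf{A}})|<1$, $\forall j$. Furthermore, $|\mathbf{I}-\widehat{\mathbf{A}}|\ne 0$. To complete the proof, we use the following theorem from [19]; the proof is omitted:

Theorem 3 (Necessary and Sufficient Conditions for Global Stability: Distinct Real Eigenvalues). Consider the system $w(i+1) = \widehat{\mathbf{A}}w(i)+B$, where $w(0)$ is given. We assume that $|\mathbf{I}-\widehat{\mathbf{A}}|\ne 0$ and $\widehat{\mathbf{A}}$ has distinct real eigenvalues. Then:

• The steady-state equilibrium $\bar{w} = [\mathbf{I}-\widehat{\mathbf{A}}]^{-1}B$ is globally stable if and only if $|\lambda_j(\widehat{\mathbf{A}})|<1$, $\forall j$.

• $\lim_{i\to\infty}w(i) = \bar{w}$ if and only if $|\lambda_j(\widehat{\mathbf{A}})|<1$, $\forall j$.

In our simple case, we consider $B := 0$. Thus, the steady-state equilibrium in (73) satisfies $\bar{w} = 0$. Then, we conclude that $\lim_{i\to\infty}w(i) = 0$ and, thus: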



$$\lim_{i\to\infty}\|L_i-L^*\|_F = 0\quad\text{and}\quad\lim_{i\to\infty}\|M_i-M^*\|_F = 0. \quad (74)$$